Method and arrangement for reducing the volume or rate of an encoded digital video
bitstream

The invention concerns in general the technological field of processing digital video
signals. In particular the invention concerns the technology of reducing the volume or rate of
a bitstream that carries an encoded digital video signal. The volume of a bitstream refers
generally to the number of bits involved, and the rate of a bitstream refers generally to the
number of bits per second which is required to transmit the bitstream between two
locations.

The common way of producing a digital representation of an image is to convert the
generally continuous image plane into a map of tightly spaced elementary picture units
called pixels, and to give each pixel a value or a group of values that represent its color,
brightness and/or other visual characteristics. A raw digital video signal is an essentially
continuous stream of successive still images where the pixels of each image are represented
by their digital values. The volume of such a bit stream depends heavily on the applied
resolution and tends to be relatively large. Various video compression methods have been
presented for encoding the digital video bitstream into a compressed form for easy
transportation and storing. In the following we will briefly recapitulate some main features
of the known MPEG-2 video compression and decompression method, where the acronym
comes from Moving Picture Experts Group.

The main part of MPEG-2 type encoding of a digital image consists of dividing the image
into blocks of 8 x 8 pixels, applying a two-dimensional DCT or discrete cosine transform to
each block to convert the spatial frequency content of the block into a series of DCT
coefficients, weighting and quantizing the DCT coefficients by a certain quantization
matrix, applying a VLC or variable length coding scheme to compact the representation of
the weighted and quantized DCT coefficients and packetizing the result together with a
certain amount of additional information into certain standardized data structures for
transportation and/or storing. An MPEG-2 decoder takes the bit stream consisting of such
standardized data structures and reconstructs the pixel values of the images by decoding the
VLC, dequantizing the groups of DCT coefficients that describe each block and applying an
inverse DCT to convert the spatial frequency content of the block back into pixel values.
The decoded digital video signal which is composed of the decoded blocks may then be
conducted for example to a displaying apparatus.

A number of modifications to the above-listed block-level operations take place according
to whether the block under consideration belongs to an I-picture, a P-picture or a B-picture.
Of these an I-picture or intra-coded picture is an independently coded picture which is also
decodable without reference to other pictures, a P-picture or predicted picture comprises
some references to a former I- or P-picture, and a B-picture or bi-directionally coded picture
may refer to either a former or an oncoming I- or P-picture or to both a former and an
oncoming I- or P-picture. Here the terms "former" and "oncoming" refer to the displaying
order of the pictures and not their transmission order which may be different. P- and
B-pictures alternate in the sequence of pictures according to a set of predefined rules.

Fig. 1 is a block diagram of a known MPEG-2 encoder. The sequence of picture frames is
input at point 101 to a preprocessing and frame reordering block 102 the output of which is
coupled through a selection switch 103 to the input of a DCT encoder 104. One of the
branches selectable with switch 103 comprises a subtraction unit 105. From the output of
the DCT encoder 104 there is a series connection of a quantization block 106, a VLC
encoder 107 and a transmission buffer 108 to the output 109 of the whole MPEG-2 encoder.
From the output of the preprocessing and frame reordering block 102 and from the
transmission buffer 108 there are connections to a bit rate control unit 110, the output of
which controls the operation of the quantization block 106. From the output of the
quantization block 106 there is also a series connection of an inverse quantization block
111, an inverse DCT block 112 and an addition unit 113 to a double switch 114 which is
arranged to couple the output of the addition unit 113 to the input of either a first frame
memory 115 or a second frame memory 116. The outputs of the frame memories 115 and
116 are coupled both to a motion compensation block 117 and a motion estimation block.
The former provides the other input signal to both the subtraction unit 105 and the addition
unit 113. The motion estimation block gets an additional input from the output of the
preprocessing and frame reordering block 102, and it provides motion vectors to both the
motion compensation block 117 and the VLC encoder 107.

Fig. 2 is a block diagram of a known MPEG-2 decoder. From the input 201 of the decoder
there is a series connection of a receiving buffer 202, a VLC decoder 203, an inverse
quantization block 204 and an inverse DCT block 205 to the first input of an addition unit
206. A first three-state switch 207 couples the output of the addition unit 206 alternately to
one of the first 208, second 209 or third 210 frame memories. A second three-state switch
211 couples alternately the output of one of the first 208, second 209 or third 210 frame
memories to the output 212 of the whole decoder. From the VLC decoder 203 there is a
connection to a motion compensation block 213 for providing the motion vectors extracted
from the received signal. The other inputs to the motion compensation block 213 come
from the outputs of the second 209 and third 210 frame memories. The output of the motion
compensation block 213 is coupled to the other input of the addition unit 206 through a
switch 214.

The compressed MPEG-2 video signal produced at the output of the encoder of Fig. 1 is
arranged according to a six-layer hierarchy which is illustrated in Fig. 3. The highest level is
the sequence layer on which the exemplary signal of Fig. 3 comprises three concatenated
video sequences. Each video sequence starts with a header section with a sequence starting
code, a sequence header and a sequence extension part. The header section may be repeated
at arbitrary parts of the video sequence. The end of the video sequence is marked with a
sequence end code.

The second highest level is the GOP or group of pictures level, where a GOP typically
contains exactly one I-picture and an arbitrary number of P- and B-pictures. Within the
video sequence each GOP starts with a GOP starting code and a GOP header, which are
followed by the picture data portion of the GOP. On the picture layer we see that within the
picture data portion of the GOP each picture starts with a picture starting code and a picture
header with an additional extension part. These are followed by the actual picture data. It
should be noted that while only one P-picture and one B-picture are explicitly shown on the
picture layer of Fig. 3, typical GOPs may comprise 1 to 4 P-pictures and 1 to 10 B-pictures.

On the slice layer the actual picture data is seen to consist of a number of slices. Each slice
begins with a slice starting code and a slice header, which are followed by at least one
macroblock. On the macroblock layer the macroblock is seen to consist of a set of
macroblock attributes, a set of motion vectors and a group of blocks. The number of blocks
in each macroblock is fixed so that there are four luminance blocks, one U chrominance
block and one V chrominance block. The chrominance resolution is half of the luminance
resolution in both horizontal and vertical directions which means that the spatial coverage
of the U and V chrominance blocks in the macroblock is the same as the combined spatial
coverage of the four luminance blocks. On the block layer each block is seen to consist of
the DCT coefficients of the block followed by a block end code.

Let us examine some phases of the generation of the signal shown in Fig. 3 by the encoder
of Fig. 1 in more detail. The DCT encoder 104 takes one block of 8 x 8 pixels at a time and
calculates a two-dimensional discrete cosine transform which results in 64 coefficients that
describe the spatial frequency content of the block. One of the coefficients (the first one in
the common mathematical representation) is the so-called DC coefficient which is
proportional to the average value of the pixels of the block. The rest of the coefficients are
known as the AC coefficients. It is conventional to represent the coefficients in an 8 x 8
matrix form where the DC coefficient is in the upper left corner. The AC coefficients are
located in the matrix so that the distance of each coefficient from the upper left corner is
proportional to the frequency represented by that coefficient: the most distant coefficients
represent the highest spatial frequencies. Additionally the direction of a fictitious line drawn
between the location of the coefficient and the upper left corner coincides with the direction
of the spatial frequency which the coefficient represents.
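
The transform described above may be illustrated with the following minimal sketch. It is not
taken from any encoder implementation: the function name dct_2d, the orthonormal DCT-II
scaling and the index convention (first index vertical, second index horizontal) are
assumptions made for illustration only.

```python
import math

N = 8  # block size used in the text


def dct_2d(block):
    """Return the N x N DCT-II coefficients of an N x N pixel block.

    coeffs[u][v]: u is the vertical frequency index and v the horizontal
    one, so coeffs[0][0] (upper left corner) is the DC coefficient.
    Orthonormal scaling is assumed here.
    """
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs


# A flat block has only a DC coefficient, proportional to the pixel average.
flat_coeffs = dct_2d([[100] * N for _ in range(N)])

# A block that varies only from left to right has energy only in the first
# row of the matrix, i.e. only horizontally directed spatial frequencies.
ramp_coeffs = dct_2d([[x for x in range(N)] for _ in range(N)])
```

With the scaling assumed here the DC coefficient equals eight times the average pixel value
of the block; other scalings change the constant but not the proportionality.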
The 8 x 8 matrix of DCT coefficients for each block is not transmitted as such, but in a
weighted, quantized and variable length coded (VLC) form. Weighting means that each
element in the DCT coefficient matrix is divided by the corresponding element in an 8 x 8
weighting matrix. Quantization and VLC encoding may then be understood as rounding
each quotient into the nearest integer and providing a codeword representation for the
results: each non-zero rounded quotient is mapped into a codeword that unequivocally
indicates both the value of the rounded quotient and the number of zeros, if any, occurring
between that quotient and the previous non-zero quotient when the quotients are read from
the 8 x 8 matrix in the predefined zigzag order illustrated by line 401 in Fig. 4. The coding
of runs of successive constant values into code words instead of transmitting the values
explicitly is also known as run length encoding.
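
The weighting, rounding and zigzag reading described above can be sketched as follows. The
helper names, the flat example weighting matrix and the rounding of exact halves away from
zero are assumptions made for illustration; the actual codeword tables and the separate
handling of certain coefficients in the standard are not modelled.

```python
import math


def round_nearest(x):
    """Round to the nearest integer, exact halves away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def weight_and_quantize(coeffs, weights):
    """Divide each coefficient by its weighting element and round."""
    return [[round_nearest(c / w) for c, w in zip(crow, wrow)]
            for crow, wrow in zip(coeffs, weights)]


def zigzag_order(n=8):
    """Return (row, col) pairs in the conventional zigzag reading order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # even diagonals are read with decreasing row, odd ones with increasing row
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_level_pairs(levels):
    """Return (zero_run, value) pairs for the non-zero quantized values."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(levels)):
        value = levels[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are left to the block end code


# Hypothetical example block: three non-zero coefficients, flat weights of 16.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[2][0] = 160.0, -33.0, 21.0
weights = [[16.0] * 8 for _ in range(8)]
levels = weight_and_quantize(coeffs, weights)
pairs = run_level_pairs(levels)
```

In this example the 64 quotients collapse into three (run, value) pairs, which is the reason
why blocks with long runs of zeros cost few bits.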

The natural form of the quantization matrix is such that its elements tend to have larger
values the farther they are from the upper left corner. As a result, in most weighted
coefficient matrices there is a certain last non-zero quotient after which the rest of the
quotients (when read in said zigzag order) are so small that rounding them into the nearest
integer produces all zeros. The relative amount of pictorial activity in the pictures to be
encoded may be counterbalanced by selecting a suitable weighting matrix: when the values
of the elements in the weighting matrix increase steeply, the relative size of the all-zeros
part of the weighted and quantized coefficient matrix increases, which together with the
run-length encoding mentioned above means fewer bits produced per block. Naturally the
weighting and quantization operation causes loss of pictorial information, so from the
viewpoint of reproducible picture quality it is advantageous to keep the "zeroing" effect of
weighting and quantization as low as possible as long as the volume or rate of the produced
bit stream is within predefined limits. The weighting matrices can be different for each
picture, meaning that each picture header part seen on the picture layer of Fig. 3 may
contain a new quantization matrix (actually the allowed quantization matrices are linear
multiples of each other, so the picture header only needs to contain a multiplier that is used
to obtain the currently valid quantization matrix from a certain predefined default matrix).

The MPEG-2 specifications introduce a so-called Virtual Buffer Verifier or VBV
mechanism to control the rate of producing an encoded bitstream. The aim of the VBV is to
ensure that it will be possible to decode the encoded bitstream with a decoder that has an
input buffer of a certain fixed size. A virtual buffer is a hypothetical first-in-first-out buffer
memory which is thought to be directly connected to the output of the encoder. The size of
the virtual buffer in bits is declared in the sequence header. At the beginning of encoding a
video sequence the virtual buffer is "filled" to a certain fullness which is specified in the
bitstream. Thereafter the buffer occupancy is inspected after each picture interval before
and after removing from the buffer the bits belonging to the picture which has been in the
buffer longest. Both before and after the removal of bits the number of bits in the buffer
must remain between zero and B, where B is the size of the virtual buffer in bits. The larger
the size of the virtual buffer, the more the number of bits produced by encoding an
individual picture is allowed to deviate from the average. If the inspection of the virtual
buffer occupancy shows an underflow, the encoded picture which was removed from the
virtual buffer consumed too many bits: more compression must be introduced by using a
steeper weighting matrix. An observed virtual buffer overflow shows that the volume of the
bit stream is about to fall below its defined minimum limit, which is corrected by adding
stuffing bits to the bitstream.
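
The inspection of the virtual buffer occupancy may be sketched as follows. This is a
simplified model and not the normative procedure of the standard: it assumes that a constant
number of bits (the hypothetical parameter bits_per_interval) enters the buffer during each
picture interval, and the function and parameter names are illustrative only.

```python
def check_virtual_buffer(picture_sizes, buffer_size, initial_fullness,
                         bits_per_interval):
    """Return a list of (picture_index, event) tuples.

    The occupancy is inspected once per picture interval, before and after
    the bits of the oldest picture are removed. "overflow" means the
    occupancy exceeded buffer_size before removal, "underflow" means it
    dropped below zero after removal.
    """
    events = []
    occupancy = initial_fullness
    for index, size in enumerate(picture_sizes):
        if occupancy > buffer_size:       # too few bits produced so far
            events.append((index, "overflow"))
        occupancy -= size                 # remove the oldest picture
        if occupancy < 0:                 # this picture consumed too many bits
            events.append((index, "underflow"))
        occupancy += bits_per_interval    # assumed constant-rate filling
    return events


steady = check_virtual_buffer([300, 300, 300], 1000, 600, 300)
too_big = check_virtual_buffer([700, 100], 1000, 600, 300)
too_small = check_virtual_buffer([50, 50, 50], 1000, 600, 300)
```

In the second call the first picture is larger than the bits available in the buffer, which
is the situation that calls for more compression; in the third call the pictures are so
small that the buffer overfills, which is the situation that calls for stuffing bits.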

The problem which the present invention aims to overcome is that once the bitstream that
carries an encoded digital video signal has been produced by the encoder, its volume or rate
is constant. A certain predefined transmission capacity is required for transmitting it
between two locations, and a certain predefined storage capacity is required to store e.g. the
complete video sequence onto a storage medium for later use. It would be advantageous if a
user or other party taking part in the transmission, storage or use of the bitstream could
adapt the volume or rate of the bitstream to the available transmission or storage capacity.

Various known video filtering techniques can be used for simplifying a picture: for example
it is possible to repeatedly take a number of adjacent pixels and replace them with a smaller
number of adjacent pixels the values of which are obtained from the values of the original
pixels through a certain averaging scheme. Reducing the total number of pixels in each
picture naturally reduces the volume or rate of the bitstream which is composed of the
pictures. Another approach is to limit the number of bits which are available to indicate the
value(s) associated with each pixel, resulting in a reduced number of different tones in the
picture. However, all such video filtering techniques where the filtering takes place on the
pixel level require that the encoded digital video signal is completely decoded, i.e. the
original pictures are restored before the filtering is possible, and re-encoded after the
filtering. Decoding and re-encoding the bitstream completely just for reducing its volume or
rate requires a considerable amount of time and other resources.

One could propose an alternative approach for reducing the volume or rate of a bitstream
where complete pictures would be cut out from the encoded bitstream without otherwise
decoding it. In order not to change the displaying rate the removed pictures should be
replaced with some kind of codes that instruct the displaying apparatus to echo the previous
picture instead or to otherwise fill the gap in the picture sequence. The drawback of this
approach is that the addition of such codes to an already applied standard is very difficult:
only new or newly reprogrammed display apparatuses would understand the codes
correctly. Additionally the removal of pictures tends to cause twitching in the displayed
video image.

It is an object of the present invention to provide a method and an arrangement for reducing
the volume or rate of an encoded digital video signal. Especially it is an object of the
invention to accomplish the volume or rate reduction essentially without requiring changes
to the existing coding standards. It is a further object of the invention to provide such a
method and arrangement so that the implementation is simple and advantageous from the
manufacturing point of view. An additional object of the invention is that the method and
arrangement should be easily integrated into various existing and future signal processing
arrangements.

The objects of the invention are achieved by partly decoding the encoded digital video
signal, applying low pass filtering and/or rescaling to the partly decoded signal and
re-encoding the result into the fully encoded form.

The method according to the invention comprises the characteristic steps of

- partly decoding an encoded digital video bitstream, thus producing a partly decoded digital
video bitstream,

- reducing the amount of bits in the partly decoded digital video bitstream and

- re-encoding the partly decoded digital video bitstream in which the amount of bits is
reduced, thus producing a re-encoded digital video bitstream which fulfils a certain set of
predefined structural rules and the volume or rate of which is smaller than that of the
encoded digital video bitstream.

The invention also applies to an arrangement which comprises as its characteristic features

- means for partly decoding an encoded digital video bitstream, 

- means for reducing the amount of bits in the partly decoded digital video bitstream and 

- means for re-encoding the partly decoded digital video bitstream in which the amount of 
bits is reduced. 

The invention is based on the insight that an encoded digital video signal does not need to
be decoded completely to reach a level where it is possible to produce even very large
variations to the volume or rate of the bitstream without making fundamental changes to its
basic structure. According to the invention the bitstream is post-processed in a form which
is somewhere between a fully encoded and fully decoded form. The level on which the
post-processing is accomplished, and the part(s) of the bitstream that are subjected to it are
selected so that the adverse effects introduced by the volume or rate reduction on the
observable quality of the signal are kept under control. The post-processing can be made
adaptive by selecting its transfer function according to certain predefined characteristics of
the signal.

Within the MPEG-2 framework the suitable level on which the invention is applied is the
level of DCT coefficients and their quantization. In the research which led to the invention
it was found that simply rescaling the DCT coefficients is not advantageous because even a
relatively moderate level of rescaling tends to make the block boundaries visible in the
reproduced picture. However, low pass filtering the DCT coefficients, i.e. changing the
relative magnitudes of the spatial frequency components within a block with an emphasis
on lower frequencies, potentially combined with rescaling, was found to produce excellent
results. To achieve the low pass filtering the weighted, quantized and VLC encoded
coefficient matrices contained in the original MPEG-2 bitstream are subjected to VLC
decoding, after which the step(s) of (rescaling and) filtering are performed and the results
are again VLC encoded. These operations are complemented by a number of supporting
steps which ensure that after the (rescaling and) low pass filtering and VLC re-encoding the
MPEG-2 bitstream with reduced volume or rate can be reconstructed without violating the
general rules governing the MPEG-2 format.

The selection of frequency response for the low pass filter may be done by several
alternative strategies. It has been found advantageous to use an adaptive filter the frequency
response of which is matched to the energy content of the picture blocks either on a block by
block basis or by using some other methods of energy content analysis. Most
advantageously the analysis of the block energy content takes separately into account the
energy associated with the different spatial frequency directions, like horizontal, vertical
and diagonal.

The novel features which are considered as characteristic of the invention are set forth in 
particular in the appended Claims. The invention itself, however, both as to its construction 
and its method of operation, together with additional objects and advantages thereof, will be 
best understood from the following description of specific embodiments when read in 
connection with the accompanying drawings. 

Fig. 1 illustrates a known MPEG-2 encoder.
Fig. 2 illustrates a known MPEG-2 decoder.
Fig. 3 illustrates the known hierarchical structure of an MPEG-2 formatted bitstream.
Fig. 4 illustrates the known zigzag reading order of DCT coefficients.
Fig. 5 is a block diagram of an advantageous embodiment of the invention.
Fig. 6a illustrates a certain filtering function.
Fig. 6b illustrates a certain definition of directionality of DCT coefficients.
Fig. 7 illustrates the compression principle of the invention.

Figs. 1 to 4 have been described above in connection with prior art, so the following
discussion will concentrate on Figs. 5, 6a, 6b and 7.

Fig. 5 is a block diagram of an apparatus which can be used to reduce the volume or rate of
an MPEG-2 encoded bitstream which appears at the input line 501. A bit stream analyzer
block 502 is coupled to the input 501. It has four data outputs which are known as the
"untouched" output, "DCT coefficients" output, "quantization matrices" output and the
"virtual buffer verifier" output. Additionally the bit stream analyzer block 502 has a control
output. At the right in Fig. 5 there is a multiplexer block 503 which has four data inputs,
one control input and one output of which the latter is coupled to the output line 504 of the
whole apparatus. The data inputs of the multiplexer block 503 have the same names as the
data outputs of the bit stream analyzer block 502.

The control output of the bit stream analyzer block 502 is directly coupled to the control
input of the multiplexer block 503 and the "untouched" output of the bit stream analyzer
block 502 is directly coupled to the corresponding input of the multiplexer block 503.

Between the "DCT coefficients" output of the bit stream analyzer block 502 and the
corresponding input of the multiplexer block 503 there is the series connection of a variable
length decoder 505, a requantization block 506, an adaptive DCT filtering block 507 and a
variable length re-encoder block 508. Between the "quantization matrices" output of the bit
stream analyzer block 502 and the corresponding input of the multiplexer block 503 there is
an element-wise matrix multiplier block 509 and between the "virtual buffer verifier"
output of the bit stream analyzer block 502 and the corresponding input of the multiplexer
block 503 there is a VBV value modifier block 510. Between the requantization block 506
and the element-wise matrix multiplier block 509 there is a control connection. Similarly
there are control connections from the variable length decoder 505 and re-encoder block
508 to the VBV value modifier block 510.

The arrangement of Fig. 5 operates according to the following description. 

The bit stream analyzer block 502 performs a demultiplexing function where the VBV
values and other virtual buffer related information are directed to the VBV value modifier
block 510, the weighting (quantization) matrices are directed to the element-wise matrix
multiplier block 509, the DCT coefficient matrices are directed to the variable length
decoder 505 and the rest of the bitstream is directed through the "untouched" output to the
corresponding input of the multiplexer block 503.

The variable length decoder 505 decodes the VLC encoded DCT coefficient matrices and
feeds them into the requantization block 506, which applies a requantization function the
aim of which is to enlarge the quantization step used in the original encoding process.
According to an advantageous embodiment of the invention the requantization is a simple
division, also known as rescaling, where all DCT coefficients of each coefficient matrix are
divided by a certain parameter which may be designated as a. For a certain reason given
below the value of a must remain constant through all blocks for which the same weighting
matrix has been used in the original encoding. The most advantageous value for the
parameter a depends on the amount of reduction which must be achieved in the volume or
rate of the bitstream. The higher the value of a, the closer the requantized DCT coefficients
tend to get to zero, which means more compression in the bitstream. Conversely,
the closer the value of a is to one, the less compression is obtained through requantization.
It has been shown that requantization easily introduces perceptible artifacts into the pictures
(e.g. the block boundaries tend to become visible) which means that depending heavily on
requantization to reduce the volume or rate of the bitstream is not advantageous. Suitable
values for a may be found by experimenting. The invention does not require the use of
requantization at all, i.e. the value of a may well be 1.
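
The rescaling by the parameter a may be sketched as follows. The function names and the
rounding rule are assumptions made for illustration. The helper scale_weights reflects one
plausible reading of the control connection between the requantization block 506 and the
element-wise matrix multiplier block 509, namely that the weighting matrix is multiplied by
the same a so that a decoder reconstructs coefficients on the original scale; the text above
does not state this explicitly.

```python
import math


def round_nearest(x):
    """Round to the nearest integer, exact halves away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def requantize(levels, a):
    """Divide every quantized DCT coefficient by a and round the result."""
    if a < 1:
        raise ValueError("a is expected to be at least 1")
    return [[round_nearest(v / a) for v in row] for row in levels]


def scale_weights(weights, a):
    """Assumed counterpart: enlarge the weighting matrix by the same factor."""
    return [[w * a for w in row] for row in weights]


# Hypothetical row of quantized coefficients and its weighting elements.
levels = [[10, -3, 1, 0]]
weights = [[16, 16, 16, 16]]

coarse = requantize(levels, 3)          # small values collapse to zero
unchanged = requantize(levels, 1)       # a = 1 leaves the coefficients as they are
halved = requantize(levels, 2)
doubled_weights = scale_weights(weights, 2)
```

In this example the first coefficient reconstructs to 10 x 16 = 160 before rescaling and to
5 x 32 = 160 after rescaling with a = 2, while the smaller coefficients lose precision.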

The requantized DCT coefficient matrices are directed to the DCT filtering block 507,
which applies a certain transfer function which is most advantageously of the low-pass type:
the DCT coefficients which represent the lowest spatial frequencies are preserved while the
DCT coefficients which represent the higher spatial frequencies are reduced in value or
even zeroed. The recommendation of a generally low-pass type filtering strategy follows
from the observation that it is the higher spatial frequencies that give rise to the blocking
artifact referred to above. The invention does not limit the actual form of the transfer
function. We will describe some potential transfer functions in more detail.

To make it easier to understand the filtering we may use a geometrical model in which the
transfer function is first defined as a certain two-dimensional curve between the ordinate
values from 0 to 7 and converted into a three-dimensional surface by rotating it around the
vertical coordinate axis by 90 degrees. Fig. 6a illustrates a surface produced by rotating the
known gaussian curve, known also as the (1 2 1) low-pass filter response, fitted into the
range from 0 to 7 around the vertical axis. Filtering with this transfer function means that
the 8 x 8 integral intersection points on the horizontal plane are considered and the
corresponding values on the surface are taken as the multipliers that are used to multiply the
8 x 8 DCT coefficients in the DCT coefficient matrix. Table I shows the multipliers in
tabular form.

Another possible transfer function could be obtained by rotating a step function with a step
from 1 to 0 at some point X (so that 0 < X < 7) around the vertical axis. This would result in
a "top hat" surface with the value 1 for all ordinate points which are closer than X to the
origin and the value 0 for all other ordinate points. Still another proposed transfer function
could be obtained by rotating a straight descending line around the vertical axis, resulting in
a conical transfer function surface with an upwards pointing vertex at the vertical axis. In
practice it has been noted that using either a "top hat" transfer function or a conical one
tends to introduce ghost lines and ripple into the picture. It is not required that the transfer
function should possess any cylindrical symmetry, i.e. the "filter surface" need not be
obtained through rotating a two-dimensional curve around the vertical axis.
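
The three rotationally obtained filter surfaces discussed above can be sketched as follows.
The function names and all parameter values (the width sigma of the gaussian curve, the step
point and the base radius of the cone) are assumptions made for illustration; they are not
the values of Table I.

```python
import math


def radial_surface(profile, n=8):
    """Sample a rotationally symmetric surface at the n x n integer points."""
    return [[profile(math.hypot(i, j)) for j in range(n)] for i in range(n)]


def gaussian(sigma):
    """Gaussian curve with value 1 at the origin (sigma is an assumption)."""
    return lambda r: math.exp(-(r * r) / (2.0 * sigma * sigma))


def top_hat(x):
    """Step from 1 to 0 at distance x from the origin."""
    return lambda r: 1.0 if r < x else 0.0


def cone(r_max):
    """Straight descending line reaching zero at distance r_max."""
    return lambda r: max(0.0, 1.0 - r / r_max)


def apply_filter(coeffs, surface):
    """Multiply each DCT coefficient by the multiplier at the same position."""
    return [[c * m for c, m in zip(crow, mrow)]
            for crow, mrow in zip(coeffs, surface)]


gauss_surface = radial_surface(gaussian(3.0))
hat_surface = radial_surface(top_hat(3.0))
cone_surface = radial_surface(cone(7.0))

ones = [[1.0] * 8 for _ in range(8)]
filtered = apply_filter(ones, gauss_surface)
```

In all three cases the multiplier at the upper left corner is 1, so the DC coefficient is
preserved, and the multipliers fall towards zero as the distance from that corner grows.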

The form of the transfer function applied in the DCT filtering block 507 to low pass filter
the spatial frequency components of the blocks is of primary importance regarding the
amount of reduction achieved in the volume or rate of the bitstream through the use of the
invention. In the following we consider exclusively transfer functions of the rotationally
obtained gaussian type, although the following discussion is also applicable to arbitrary
transfer functions. The two-dimensional gaussian curve which is used to define the filter
surface may be scaled in the horizontal direction: squeezing it closer to the origin means
that the point where the curve begins to give negligibly small values is associated already
with a relatively small ordinate value, whereas stretching it away from the origin means that
the values given by the curve remain substantially greater than zero even for relatively large
ordinate values. The effect of the squeezing or stretching of the two-dimensional curve on
the rotationally obtained filtering surface is easily understood: the "hill" around the
vertical axis becomes either steeper (squeezing) or smoother (stretching).

Because the DCT coefficients to be filtered are conceptually associated with certain points
on the ordinate plane, scaling is easily modelled by mapping each of said points consistently
to another point on the ordinate plane before reading the corresponding filtering factor from
the filtering surface. For each point the mapping takes place along a line which goes
through both the original point and the origin: squeezing means mapping the point farther
away from the origin and stretching means mapping it closer to the origin. It is clear that
squeezing is synonymous to applying a tighter low pass function (leaving only relatively
few lowest spatial frequencies and canceling all others) and stretching means that the low
pass function is loosened to pass even some of the higher spatial frequencies through in
substantial magnitude.
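
The mapping described above can be sketched as follows. The scale factor k, the function
names and the width of the gaussian surface are assumptions made for illustration: each
coefficient position is moved along the line through the origin by the factor k before the
multiplier is read from the unscaled surface, so that k greater than 1 corresponds to
squeezing and k smaller than 1 to stretching.

```python
import math


def gaussian_surface(x, y, sigma=3.0):
    """Unscaled filter surface; any surface defined over the plane will do."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))


def multipliers(surface, k=1.0, n=8):
    """Read the surface at the mapped points (k * column, k * row)."""
    return [[surface(k * j, k * i) for j in range(n)] for i in range(n)]


plain = multipliers(gaussian_surface)
squeezed = multipliers(gaussian_surface, k=2.0)    # tighter low pass function
stretched = multipliers(gaussian_surface, k=0.5)   # looser low pass function
```

Because the mapping acts on the point and not on the surface, the same sketch applies to
surfaces without rotational symmetry.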

The "squeezing" or "stretching" of the filter surface is more generally known as adapting
the filtering function. According to the MPEG-2 embodiment of the invention the filtering
function is adapted enough to achieve, together with the possible effect of the rescaling
described above, the required reduction in the volume or rate of the bitstream. The required
degree of adaptation, i.e. the amount by which the filter surface is squeezed or stretched to
achieve a certain predefined reduction in volume or rate, may be preprogrammed to a
look-up table which the DCT filtering block 507 consults after the apparatus of Fig. 5 has
received a command to perform a bitstream volume or rate reduction operation from a
certain given input volume or rate to a certain given output volume or rate, or the DCT
filtering block may obtain it dynamically by starting with a certain preprogrammed default
filtering function and using a feedback loop to change the amount of adaptation if the
obtained output volume or rate is too high or too low.
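
The dynamic alternative may be sketched as follows. This is only one possible feedback
scheme and rests on several assumptions that are not part of the description above: the
output volume is approximated by the number of coefficients that remain non-zero after
filtering and rounding, the filter is a gaussian surface with a hypothetical width sigma,
and the amount of squeezing k is searched by bisection between two hypothetical limits.

```python
import math


def filtered_cost(blocks, k, sigma=3.0):
    """Proxy for the output volume: coefficients left non-zero after filtering."""
    count = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, c in enumerate(row):
                m = math.exp(-(k * k) * (i * i + j * j) / (2.0 * sigma * sigma))
                if round(c * m) != 0:
                    count += 1
    return count


def adapt_scale(blocks, target, k_lo=0.0, k_hi=8.0, iterations=30):
    """Search for the loosest squeezing factor whose cost does not exceed target.

    The cost never increases when k grows, so bisection is applicable. If
    even k_hi does not reach the target, k_hi is returned as it is.
    """
    for _ in range(iterations):
        k = (k_lo + k_hi) / 2.0
        if filtered_cost(blocks, k) > target:
            k_lo = k   # output too large: squeeze the surface more
        else:
            k_hi = k   # output small enough: try a looser filter
    return k_hi


# Hypothetical input: one block in which every coefficient has the value 10.
blocks = [[[10.0] * 8 for _ in range(8)]]
k_found = adapt_scale(blocks, target=10)
```

A real arrangement would measure the number of bits actually produced by the variable length
re-encoder instead of the proxy used here.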

Up to this point we have described the use of the same filtering function to all DCT
coefficients in a coefficient matrix. The invention does not contain such a limitation.
Indeed, it has been found that by applying a differently adapted filtering function to
horizontally, vertically and diagonally directed spatial frequencies it is possible to obtain
very advantageous results. Fig. 6b illustrates an exemplary division of the DCT coefficients
to those relating to horizontally, vertically and diagonally directed spatial frequencies. Also
other kinds of definitions are possible, as is the use of a larger or smaller number of
directional groups.

Given that a grouping into horizontally, vertically and diagonally directed spatial
frequencies is defined, it is advantageous to define the scaling factor for the filtering
function separately for each group. A simple way of defining the scaling factor is to take the
DCT coefficient that represents the highest signal energy within the group, and examine its
position within the DCT coefficient matrix. The position may be represented with a variable
P. If we are considering the group of horizontally directed spatial frequencies, let P take the
horizontal index value of the examined DCT coefficient. In other words, if within the group
of horizontally directed spatial frequencies the highest signal energy is represented by the
coefficient the location of which in the DCT coefficient matrix is (i1, j1), let P have the
value j1 when the filtering of the horizontally directed spatial frequencies is considered.
Similarly, if within the group of vertically directed spatial frequencies the highest signal
energy is represented by the coefficient the location of which in the DCT coefficient matrix
is (i2, j2), let P have the value i2 when the filtering of the vertically directed spatial
frequencies is considered. If within the group of diagonally directed spatial frequencies the
highest signal energy is represented by the coefficient the location of which in the DCT
coefficient matrix is (i3, j3), let P have the greater of the values i3 and j3 when the filtering
of the diagonally directed spatial frequencies is considered.
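
The selection of P may be sketched in Python as follows. The division into groups made by
group_of is a hypothetical stand-in for the division of Fig. 6b, which is not reproduced in
the code, and the signal energy of a coefficient is taken to be its squared value; both are
assumptions of the example. The index i counts rows (vertical) and j counts columns
(horizontal).

```python
def group_of(i: int, j: int):
    """Assumed directional grouping; the DC coefficient belongs to no group."""
    if i == 0 and j == 0:
        return None
    if j > 2 * i:
        return "horizontal"
    if i > 2 * j:
        return "vertical"
    return "diagonal"


def position_variables(block) -> dict:
    """Return the value of P for each directional group of an 8 x 8 block."""
    best = {}  # group -> (energy, i, j) of the strongest coefficient so far
    for i in range(8):
        for j in range(8):
            group = group_of(i, j)
            if group is None:
                continue
            energy = block[i][j] ** 2
            if group not in best or energy > best[group][0]:
                best[group] = (energy, i, j)
    result = {}
    for group, (_, i, j) in best.items():
        if group == "horizontal":
            result[group] = j          # horizontal index
        elif group == "vertical":
            result[group] = i          # vertical index
        else:
            result[group] = max(i, j)  # greater of the two indices
    return result
```

When several coefficients of a group have equal energy, this sketch keeps the first one met
in row-by-row order, a choice which the description leaves open.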

When the value for the variable P has been found within a group of spatial frequencies
directed to a certain direction, an advantageous way of calculating the scaling factor for the
filtering function regarding that group of spatial frequencies is to divide the value of P by a
certain number which may be constant or which may be obtained from a look-up table
relating to a required compression ratio as described above. The divisor used to divide the
value of P may also be dynamically adapted by using feedback that describes the relation
between the obtained and required compression ratio. For the adaptive filtering to be
effective it is advantageous to select the divisor so that relatively high values of P cause
stretching and relatively low values of P cause squeezing of the filtering function.
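
The division may be sketched in Python as follows. The function names and the convention
that a scaling factor above one means stretching and below one means squeezing are
assumptions of the example; the divisor itself would come from a constant, a look-up table
or feedback as explained above.

```python
def scaling_factor(p: int, divisor: float) -> float:
    """Scaling factor of the filtering function for one directional group."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return p / divisor


def adaptation(p: int, divisor: float) -> str:
    """Describe the effect of the scaling factor on the filter surface."""
    factor = scaling_factor(p, divisor)
    if factor > 1.0:
        return "stretch"
    if factor < 1.0:
        return "squeeze"
    return "unchanged"
```

With an assumed divisor of 3, for example, P = 6 gives a scaling factor of 2 (stretching) and
P = 1 gives a scaling factor of 1/3 (squeezing).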

Other advantageous ways of finding the correct adaptation of the filtering function
associated with a certain group of spatial frequencies directed to a certain direction are the
calculations of the variance or mean absolute error s of signal frequencies represented by
the DCT coefficients belonging to the group. The variance is calculated according to the
formula

$$\sigma^2 = \frac{1}{n} \sum_{i,j} \left( DCT(i,j) - \bar{x} \right)^2$$

and the mean absolute error s is calculated according to the formula

$$s = \frac{1}{n} \sum_{i,j} \left| DCT(i,j) - \bar{x} \right|$$

where n is the number of DCT coefficients in the group, DCT(i, j) is the DCT coefficient at
location (i, j) within the DCT coefficient matrix, $\bar{x}$ is the mean value of the DCT
coefficients in the group and the summing over i and j extends through the group. The
variance or mean absolute error may be used as such as the scaling factor for the filtering
function associated with that group, or it may be divided or multiplied by a number which is
defined similarly to the divisor of the value P described above.
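
The two statistics may be computed as in the following Python sketch. It assumes that both
sums are normalised by the number n of coefficients in the group and that the coefficients
of one group have already been collected into a list; the function name is hypothetical.

```python
def group_statistics(values):
    """Return (variance, mean absolute error) of the coefficients of one group."""
    n = len(values)
    if n == 0:
        raise ValueError("the group must contain at least one coefficient")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    mean_abs_error = sum(abs(v - mean) for v in values) / n
    return variance, mean_abs_error
```

For a group whose coefficients are all equal both statistics are zero, and they grow as the
coefficients of the group spread further from their mean.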

Let us complement the description of the adaptive filtering function with an example. We
consider the filtering function which is used to filter the group of horizontally directed
spatial frequencies (with the group defined as in Fig. 6b) first in a case where the block to
be filtered is found to contain high horizontal activity and then in a case where the block to
be filtered is found to contain low horizontal activity. Table II shows the two top rows of
the original filtering matrix which is used to describe the filter surface in the actual filtering
operation. These are the same as the two top rows of Table I above.



Table II

1      0,95   0,81   0,61   0,39   0,19   0,05   0
0,95   0,9    0,77   0,57   0,36   0,17   0,04   0



Note that the leftmost column does not affect the filtering of horizontally directed spatial
frequency components, since the top value there corresponds to the DC coefficient and the
lower value corresponds to the topmost coefficient of the vertical group. Let us assume that in
a block where high horizontal activity is found the largest coefficient is at a horizontal
location 6, which becomes the value of P. This is a relatively large value of P, so stretching
is caused. The resulting top rows of the modified filtering matrix may look like Table III.



Table III

1      0,97   0,95   0,88   0,81   0,71   0,61   0,50
0,96   0,93   0,90   0,84   0,77   0,67   0,58   0,47



Let us then make an alternative assumption according to which the block to be filtered
contains only low horizontal activity: within the group of horizontal spatial frequencies
the largest coefficient is at a horizontal location 1. This is a relatively small value of P, so
squeezing is caused. The resulting top rows of the modified filtering matrix may look like
Table IV.

Table IV

1      0,61   0,05   0      0      0      0      0
0,61   0,58   0,04   0      0      0      0      0
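
The relation between the top rows of Tables II, III and IV may be illustrated with a short
Python sketch. It assumes, on the basis of the tabulated values only, that the scaling factor
is P divided by 3 and that a factor falling between two columns of the original row is
obtained by linear interpolation; the names are hypothetical. Under these assumptions the
top rows of Tables III and IV are obtained from the top row of Table II to within 0,01.

```python
TABLE_II_TOP_ROW = [1.0, 0.95, 0.81, 0.61, 0.39, 0.19, 0.05, 0.0]
ASSUMED_DIVISOR = 3.0  # assumption of this sketch, not stated in the text


def resample_row(row, scale):
    """Read the factor for column j at position j / scale of the original row."""
    last = len(row) - 1
    out = []
    for j in range(len(row)):
        x = j / scale
        if x >= last:
            out.append(row[last])  # beyond the surface: use the last factor
            continue
        k = int(x)
        frac = x - k
        out.append(row[k] * (1.0 - frac) + row[k + 1] * frac)
    return out


stretched = resample_row(TABLE_II_TOP_ROW, 6 / ASSUMED_DIVISOR)  # P = 6
squeezed = resample_row(TABLE_II_TOP_ROW, 1 / ASSUMED_DIVISOR)   # P = 1
```

The sketch covers only the top row; the second rows of the tables also depend on how the
vertical index enters the mapping, which is not modelled here.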



As an alternative to the groupwise adaptation we may present a simpler embodiment of the
invention in which the whole DCT coefficient matrix is treated as a single group where the
largest coefficient value is found at location (i4, j4). The value of P is selected as the greater
of the indices i4 and j4 when the filtering of all spatial frequencies is considered.

After each group of frequencies has been filtered with the transfer function the adaptation of
which has been separately calculated for each group (or with the same transfer function for
all, if the separately adapted transfer functions are not used), the filtered DCT coefficient
matrix is produced by inserting into an 8 x 8 matrix all the DCT coefficients obtained
through elementwise multiplication between the original (possibly requantized) DCT
coefficients and the corresponding elements in the filtering matrix.
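
The elementwise multiplication may be sketched in Python as follows. Rounding the
products to integers is an assumption of the example, made because the coefficients are
subsequently variable length coded as integer values; the function names are hypothetical.

```python
def apply_filter(coefficients, filtering_matrix):
    """Multiply each DCT coefficient by the matching filtering factor."""
    return [[round(c * f) for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(coefficients, filtering_matrix)]


def count_nonzero(matrix):
    """Number of coefficients that remain to be coded."""
    return sum(1 for row in matrix for value in row if value != 0)
```

Filtering factors equal or close to zero turn the corresponding coefficients into zeroes,
which is what shortens the subsequent variable length code.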

The filtered DCT coefficient matrices are directed to the variable length re-encoder block
508 which re-encodes them preferably according to exactly the same method which is
employed in the known MPEG-2 encoders. The re-encoded result comprises fewer bits than
the stream of original VLC encoded DCT coefficient matrices fed into the variable length
decoder 505, because the filtering function of block 507 has produced longer runs of zeroes
in the matrices.

It should be noted that the invention does not require the requantization block 506 and the
DCT filtering block 507 to be located in this order. In other words, in an alternative
embodiment of the invention the output of the VLC decoder block 505 is coupled to the
input of the variable length re-encoder block 508 through a DCT filtering block and a
requantization block in this order.

We will now move on to describe the role of the element-wise matrix multiplier block 509
which is located between the "quantization matrices" output of the bit stream analyzer
block 502 and the corresponding input of the multiplexer block 503. Previously we have
stated that the value of a used in the requantization block 506 to requantize the DCT
coefficients must remain constant through all blocks for which the same weighting matrix
has been used in the original encoding. The reason for this is that the reduction in the overall
tone level caused by the requantization must be compensated for by multiplying the
corresponding weighting matrix by the same factor which was used to divide the DCT
coefficients in the requantization. Therefore the bit stream analyzer 502 takes the
information related to the weighting matrices from the original bitstream the volume or rate
of which should be reduced, and runs it through the element-wise matrix multiplier block
509. The latter gets from the requantization block 506 the value of a which was used in the
requantization, and modifies the weighting matrix information accordingly: if the weighting
matrix coefficients are transmitted as such in the picture header, the element-wise matrix
multiplier block 509 multiplies them by the obtained value of a. If the allowed
quantization matrices are linear multiples of each other and the picture header only contains
a multiplier that is used to obtain the currently valid quantization matrix from a certain
predefined default matrix, the element-wise matrix multiplier block 509 multiplies the
multiplier by the obtained value of a.
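
The compensation may be illustrated with the following Python sketch. It uses a deliberately
simplified reconstruction model in which a decoder restores a coefficient as the product of
the transmitted value and the corresponding weighting matrix element, omitting the other
details of MPEG-2 inverse quantization; the function name and the rounding of the divided
coefficients are likewise assumptions of the example.

```python
def requantize_with_compensation(levels, weights, a):
    """Divide the coefficients by a and multiply the weighting matrix by a."""
    new_levels = [[round(level / a) for level in row] for row in levels]
    new_weights = [[weight * a for weight in row] for row in weights]
    return new_levels, new_weights


def reconstruct(levels, weights):
    """Simplified decoder model: coefficient = transmitted value x weight."""
    return [[level * weight for level, weight in zip(l_row, w_row)]
            for l_row, w_row in zip(levels, weights)]
```

When the division leaves no remainder the reconstructed coefficients are unchanged;
otherwise they differ from the original ones only by the rounding error of the requantization.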

Next we will briefly discuss the operation of the VBV value modifier block 510. Its task is
simply to ensure that the VBV values in the modified bitstream are in accordance with the
VBV specifications known from the MPEG-2 standards. The VBV value modifier block
510 gets control information both from the VLC decoder block 505 and the variable length
re-encoder block 508 so that it is aware of the exact amount of reduction achieved in the
volume or rate of the bitstream. It modifies the VBV values obtained from the bitstream
analyzer 502 so that the reduction in the amount of bits associated with each block is
correctly reflected by the modified VBV values.

We will conclude the operational description of Fig. 5 by briefly describing the operation of
the multiplexer block 503. Its task is to reconstruct the bitstream from the components it
receives from the bitstream analyzer 502, the variable length re-encoder block 508, the
element-wise matrix multiplier block 509 and the VBV value modifier block 510. It
receives from the bitstream analyzer 502 the necessary synchronization information with
which it is able to reconstruct the bitstream so that the various delays caused by the
processing operations in blocks 505 to 510 do not destroy the temporal relations of the
bitstream components. Outputting, through the output line 504, the final modified bitstream
the volume or rate of which has been reduced may take place in complete synchronization
with the reading of the input stream through line 501 (for example, if a reduction of exactly
50% has been achieved, the output clock may be the input clock divided by two), or the
input and output may be completely out of synchronization. The latter alternative is
probably the more advantageous, because the achieved reduction is seldom an exact fraction
of the input volume or rate.

Fig. 7 is a flow diagram that illustrates the principle of compressing an encoded digital
video bitstream according to the invention. The top and bottom rows in Fig. 7 are known
from prior art, and the invention relates to the middle row. A graphical image is mapped
into pixels at step 701 by a digital video camera or a corresponding apparatus. An inner
coding 702 is performed, which in the MPEG-2 system corresponds to the DCT encoding
phase. After that an outer encoding 703 is performed; in the MPEG-2 system this
corresponds to the weighting, quantization and VLC encoding of the DCT coefficient
matrices. The compression in accordance with the invention consists of decoding the outer
encoding at step 704, compressing the partly encoded image data at step 705 and restoring
the outer coding at step 706. After that the compressed, encoded digital video bitstream may
be led e.g. to a displaying apparatus where the outer encoding is decoded at step 707, the
inner encoding is decoded at step 708 and the raw image data is mapped into pixels on a
display screen at 709. Various storing, transmitting and receiving steps as well as
encapsulations of the encoded digital video bitstream into transport containers like IP
(Internet Protocol) data packets or ATM (Asynchronous Transfer Mode) cells and
decapsulations from them may take place between the steps shown in Fig. 7.

The compression of the encoded digital video bitstream may take place at an arbitrary
location between the source of the bitstream and its displaying. An advantageous
application of the invention is to compress the encoded digital video bitstreams that are to
be transmitted as a part of a video telephone connection or an Internet connection over a
cellular radio network. It should be noted that the known and proposed handheld mobile
stations through which a cellular video telephone call or a cellular Internet connection
would be established invariably comprise a rather small-sized display which is not capable
of reproducing a digital video image with the same resolution and fidelity as e.g. a large TV
screen or a tabletop computer. Therefore it is in many cases very advantageous to compress
an encoded digital video bitstream before transmitting it over the radio interface to such a
mobile station, because a considerable reduction may be achieved in the required amount of
radio resources and because the limited displaying capabilities of the mobile station would
in any case make it difficult to utilize all the detailed information contained in the original
encoded digital video bitstream. The mobile station and the network may even negotiate
about the capabilities of the mobile station and the availability of radio resources at the
setup phase of a video telephone connection or a cellular Internet connection so that the
network will compress the original encoded digital video bitstream to a volume or rate that
is both compatible with the mobile station's capability and transmittable over the radio
interface.